Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
Authors
Abstract
Combining modern reinforcement learning and deep learning approaches promises significant progress on challenging applications that require both rich perception and policy selection. The Arcade Learning Environment (ALE) provides a set of Atari games that serves as a useful benchmark for such applications. DQN, a recent breakthrough in combining model-free reinforcement learning with deep learning, has produced the best real-time agents thus far. Planning-based approaches achieve far higher scores than the best model-free approaches, but they exploit information that is not available to human players, and they are orders of magnitude slower than real-time play requires. Our main goal in this work is to build a real-time Atari game-playing agent that is better than DQN. The central idea is to use the slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play. We propose new agents based on this idea and show that they outperform DQN.
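The central idea (a slow planner labels states, and a fast learned model imitates it) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation. A hypothetical 1-D corridor (`step`, `GOAL`, `GAMMA`) stands in for the Atari emulator. UCT, one Monte-Carlo Tree Search variant, serves as the slow planner. A per-state majority-vote lookup table (`train_classifier`) stands in for the deep network, trained by imitating the planner's chosen actions as a classification target. All function names and parameters (`uct_action`, `n_iter`, `horizon`, `c`, `max_steps`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy "game": a 1-D corridor standing in for an Atari emulator.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor (assumed)


def step(state, action):
    """Deterministic simulator: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def uct_action(root, rng, n_iter=200, horizon=10, c=1.0):
    """Slow planner: UCT search from `root`; returns the most-visited root action."""
    n_node = defaultdict(int)   # visits to (depth, state)
    n_edge = defaultdict(int)   # visits to (depth, state, action)
    q = defaultdict(float)      # running mean return of (depth, state, action)

    def rollout(state, depth):
        # Uniform-random default policy below the tree, with discounting.
        ret, discount = 0.0, 1.0
        while depth < horizon:
            state, r, done = step(state, rng.choice(ACTIONS))
            ret += discount * r
            discount *= GAMMA
            depth += 1
            if done:
                break
        return ret

    def simulate(state, depth):
        if depth >= horizon:
            return 0.0
        node = (depth, state)
        untried = [a for a in ACTIONS if n_edge[node + (a,)] == 0]
        if untried:  # expansion: try each action once, then evaluate by rollout
            a = rng.choice(untried)
            nxt, r, done = step(state, a)
            ret = r if done else r + GAMMA * rollout(nxt, depth + 1)
        else:  # selection with the UCB1 rule
            log_n = math.log(n_node[node])
            a = max(ACTIONS, key=lambda b: q[node + (b,)]
                    + c * math.sqrt(log_n / n_edge[node + (b,)]))
            nxt, r, done = step(state, a)
            ret = r if done else r + GAMMA * simulate(nxt, depth + 1)
        n_node[node] += 1
        n_edge[node + (a,)] += 1
        q[node + (a,)] += (ret - q[node + (a,)]) / n_edge[node + (a,)]
        return ret

    for _ in range(n_iter):
        simulate(root, 0)
    # Most-visited root action; ties broken by the higher mean return.
    return max(ACTIONS, key=lambda a: (n_edge[(0, root, a)], q[(0, root, a)]))


def collect_dataset(rng, max_steps=15):
    """Run the slow UCT agent and record (state, action) training pairs."""
    data = []
    for start in range(GOAL):
        state = start
        for _ in range(max_steps):
            action = uct_action(state, rng)
            data.append((state, action))
            state, _, done = step(state, action)
            if done:
                break
    return data


def train_classifier(data):
    """Stand-in for the deep network: majority-vote action per state."""
    votes = defaultdict(Counter)
    for state, action in data:
        votes[state][action] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in votes.items()}


rng = random.Random(0)
dataset = collect_dataset(rng)
policy = train_classifier(dataset)  # fast policy: one lookup per move, no search
print(len(dataset), policy)
```

The expensive search runs only while the dataset is built. At play time the agent answers each move with a single lookup in `policy`, which mirrors, in miniature, why a learned model can act in real time where the planner cannot. In the paper's setting the lookup table would be replaced by a network that generalizes across raw game frames.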
Similar resources
Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games
Monte Carlo Tree Search (MCTS) methods have proven powerful in planning for sequential decision-making problems such as Go and video games, but their performance can be poor when the planning depth and sampling trajectories are limited or when the rewards are sparse. We present an adaptation of PGRD (policy-gradient for reward design) for learning a reward-bonus function to improve UCT (a MCTS a...
Monte-Carlo Planning for Pathfinding in Real-Time Strategy Games
In this work, we explore two Monte-Carlo planning approaches: Upper Confidence Tree (UCT) and Rapidly-exploring Random Tree (RRT). These Monte-Carlo planning approaches are applied in a real-time strategy game for solving the path finding problem. The planners are evaluated using a grid-based representation of our game world. The results show that the UCT planner solves the path planning problem...
Real-Time Path Planning using a Simulation-Based Markov Decision Process
This paper introduces a novel path planning technique called MCRT which is aimed at non-deterministic, partially known, real-time domains populated with dynamically moving obstacles, such as might be found in a real-time strategy (RTS) game. The technique combines an efficient form of Monte-Carlo tree search with the randomized exploration capabilities of rapidly exploring random tree (RRT) pla...
Neurohex: A Deep Q-learning Hex Agent
DeepMind’s recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman level agents — e.g. for Atari games via deep Q-learning and for the game of Go via other deep Reinforcement Learning methods — raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider DQL for the game of Hex: after s...
Learning to Play Hearthstone Using Machine Learning
The subject of this thesis is a new game called Hearthstone. It is a strategy card game developed by Blizzard Entertainment, in which players duel with each other with cards they collected. The game of Hearthstone provides a challenge for developing an artificial intelligence (AI) agent. The agent has to be able to deal with unknown information and stochastic events in a large search space. In ...